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Abstract. Critical decisions frequently rely on high-dimensional output from 
complex computer simulation models that show intricate cross-variable, 
spatial and temporal dependence structures, with weather and climate pre- 
dictions being key examples. There is a strongly increasing recognition of 
the need for uncertainty quantification in such settings, for which we pro- 
pose and review a general multi-stage procedure called ensemble copula 
coupling (ECC), proceeding as follows. 

1. Generate a raw ensemble, consisting of multiple runs of the computer 
model that differ in the inputs or model parameters in suitable ways. 

2. Apply statistical postprocessing techniques, such as Bayesian model 
averaging or nonhomogeneous regression, to correct for systematic errors 
in the raw ensemble, to obtain calibrated and sharp predictive distributions 
for each univariate output variable individually. 

3. Draw a sample from each postprocessed predictive distribution. 

4. Rearrange the sampled values in the rank order structure of the raw 
ensemble, to obtain the ECC postprocessed ensemble. 

The use of ensembles and statistical postprocessing have become routine 
in weather forecasting over the past decade. We show that seemingly un- 
related, recent advances can be interpreted, fused and consolidated within 
the framework of ECC, the common thread being the adoption of the em- 
pirical copula of the raw ensemble. Depending on the use of Quantiles, 
Random draws or Transformations at the sampling stage, we distinguish 
the ECC-Q, ECC-R and ECC-T variants, respectively. We also describe re- 
lations to the Schaake shuffle and extant copula based techniques. In a case 
study, the ECC approach is applied to predictions of temperature, pressure, 
precipitation and wind over Germany, based on the 50-member European 
Centre for Medium-Range Weather Forecasts (ECMWF) ensemble. 
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1. INTRODUCTION 

In a vast range of applications, critical decisions depend on the output of com- 
plex computer simulation models, with examples including weather and climate 
predictions and the management of floods, wildfires, air quality, and ground- 
water contaminations. There is a strongly increased recognition of the need for 
quantifying the uncertainty in the model output, as evidenced by the creation 
of pertinent American Statistical Association (ASA) and Society for Industrial 
and Applied Mathematics (SIAM) interest groups, and by the recent launch of 
the SIAM/ AS A Journal on Uncertainty Quantification. As SIAM President Nick 
Trefethen [2012] notes succinctly, 

"An answer that used to be a single number may now be a statistical distribution." 

Frequently, the goal is prediction, and we are witnessing a transdisciplinary 
change of paradigms in the transition from deterministic or point forecast to 
probabilistic or distributional forecasts [Gneiting, 2008]. The goal is to obtain 
calibrated and sharp, joint predictive distributions of future quantities of inter- 
est, from which any desired functionals, such as event probabilities, moments, 
quantiles and prediction intervals can be extracted, for a full quantification of 
the predictive uncertainty. While our data examples all concern weather fore- 
casting, where the recognition of the need for uncertainty quantification can be 
traced at least to Cooke [1906], the methods and principles we discuss apply in 
much broader contexts, both predictive and in other settings, where one seeks to 
quantify the uncertainty in our incomplete knowledge of current or past quantities 
and events. 

Focusing attention on the setting of our case study, accurate predictions of fu- 
ture weather are invaluable for society. Medium-range weather forecasts, with lead 
times up to two weeks, are obtained by numerically solving the partial differential 
equations that describe the physics of the atmosphere, with initial conditions pro- 
vided by estimates of the current state of the atmosphere [Kalnay, 2003] . In order 
to account for the uncertainties in the forecast, national and international mete- 
orological centers use ensembles of numerical weather prediction (NWP) model 
output, where the ensemble members differ in terms of the two major sources of 
uncertainty, namely, the initial conditions and the parameterization of the NWP 
model [Palmer, 2002, Gneiting and Raftery, 2005]. To give an example, Figures 
1 and 2 illustrate forecasts of surface temperature and six-hour precipitation 
accumulation over Germany issued by the European Centre for Medium-Range 
Weather Forecasts (ECMWF) as a part of its 50-member real-time ensemble, 
which operates at a horizontal resolution of approximately 32 km and lead times 
up to ten days [Molteni et al., 1996, Leutbecher and Palmer, 2008]. 

While the goal of NWP ensemble systems is to capture the inherent uncer- 
tainty in the prediction, they are subject to systematic errors, such as biases and 
dispersion errors. It is therefore common practice to statistically postprocess the 
output of NWP ensemble forecasts, with state of the art techniques including 
the ensemble Bayesian model averaging (BMA) approach developed by Raftery 
et al. [2005], and the non- homogeneous regression (NR) or ensemble model output 
statistics (EMOS) technique proposed by Gneiting et al. [2005]. 

To illustrate the idea, let y denote the weather quantity of interest, such as 
temperature at a specific location and look-ahead time, and write xi, . . . , xm for 
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Fig 1. 48-hour ahead ECWMF ensemble forecast for temperature over Germany valid 2:00 am 
on April 1, 2011, in the unit of degrees Celsius. Six randomly selected members are shown. The 
top left panel shows the locations of the three stations used in the subsequent case study. 




Fig 2. 24-hour ahead ECWMF ensemble forecast for six-hour precipitation accumulation over 
Germany valid 2:00 am on May 20, 2010, in the unit of millimeters. Six randomly selected 
members are shown. 
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Fig 3. 48-hour ahead BMA postprocessed predictive distributions for temperature in Berlin based 
on the 50-member ECMWF ensemble. The ensemble forecast is shown in red, the realizing ob- 
servation in blue. Left: Predictive density valid 2:00 am on April 1, 2011. Right: 10th, 50th and 
90th percentiles of the predictive distributions valid 2:00 am on April 1-14, 2011. 

the corresponding M ensemble member forecasts. The ensemble BMA approach 
employs mixture distributions of the general form 

M 

y\x!,...,X M ~ ^ w rnf{y\ 
m=l 

where the left-hand side refers to the conditional distribution given the ensemble 
member forecasts. Here /(y | x m ) denotes a parametric probability distribution or 
kernel that depends on the ensemble member forecast x m in suitable ways, with 
the mixture weights w\, . . . , wm reflecting the members' relative contributions to 
predictive skill over a training period. BMA postprocessed predictive distribu- 
tions based on the 50-member ECMWF ensemble are illustrated in Figure 3 for 
temperature, where the kernel is normal, and in Figure 4 for precipitation, where 
the kernel comprises a point mass at zero along with a power transformed gamma 
distribution for positive accumulations. 

In contrast, the NR predictive distribution is a single parametric distribution 
of the general form 

y\xi,...,x M ~ g(y\xi, . . . ,x M ), 

where g is a parametric distribution function with location, scale and shape pa- 
rameters depending on the ensemble values in suitable ways. For example, g could 
be normal with the mean an affme function of the ensemble member forecasts 
and the variance an affine function of the ensemble variance. 

Statistical postprocessing techniques such as ensemble BMA and NR have been 
shown to substantially improve the predictive skill of the NWP ensemble output 
[Wilks and Hamill, 2007]. Frequently, they apply to each weather variable at 
each location and each lead time individually, and thus they may fail to take 
cross- variable, spatial and temporal interactions properly into account. NWP 
models rely on discretizations of the equations that govern the physics of the 
atmosphere and multivariate dependence structures thus tend to be reasonably 
well represented in the raw ensemble system. However, these structures may fail 
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Fig 4. 24-hour ahead BMA postprocessed predictive distributions for precipitation in Frankfurt 
based on the 50-member ECMWF ensemble. The ensemble forecast is shown in red, the realizing 
observation in blue. Left: Mixed discrete-continuous predictive distribution valid 2:00 am on May 
20, 2010, comprising a point mass of 0.033 at zero, which is indicated by the respective black bar, 
and a density at positive accumulations, with mass 0.967. Right: 10th, 50th and 90th percentiles 
of the predictive distribution distributions valid 2:00 am on May 18-31, 2011. 

to be retained if the univariate margins are postprocessed individually. In low- 
dimensional or highly structured settings, parametric approaches to the modeling 
of multivariate dependence structures in the forecast errors are feasible, such as 
in the recent work of Pinson [2012], Schuhen et al. [2012] and Sloughter et al. 
[2013] on wind vectors, or in the approach of Gel et al. [2004] and Berrocal et al. 
[2007] that relies on geostatistical models in spatial settings. 

However, the statistical postprocessing of a full NWP ensemble forecast poses 
extremely high dimensional problems. For instance, we might be interested in five 
weather variables at 500 x 500 grid boxes, ten vertical levels and 72 lead times, 
a total of 900 million variables. While not all of them may need to be consid- 
ered simultaneously, critical applications, such as air traffic control [Chaloulos 
and Lygeros, 2007], air quality [Delle Monache et al., 2006] and flood manage- 
ment [Cloke and Pappenberger, 2009, Schaake et al., 2010], depend on physically 
realistic probabilistic forecasts of spatio-temporal weather trajectories and there- 
fore may entail much higher dimensions than can readily be incorporated into a 
parametric model. 

To address this challenge, we propose and review a general multi-stage pro- 
cedure called ensemble copula coupling (ECC), originally hinted at by Bremnes 
[2007] and Krzysztofowicz and Toth [2008], and recently investigated and de- 
veloped by Schefzik [2011]. The ECC approach allows for the multivariate rank 
dependence structure of the raw NWP ensemble to be preserved in the postpro- 
cessed ensemble, proceeding roughly as follows. 

Univariate postprocessing Apply statistical postprocessing techniques, such 
as ensemble BMA or NR, to obtain calibrated and sharp marginal predic- 
tive distributions for each weather variable, location and look- ahead time 
individually. 

Quantization Draw a discrete sample of the same size as the raw ensemble from 
each univariate, postprocessed predictive distribution. 
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(b) Individual BMA postprocessing 
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(c) ECC postprocessed ensemble 

Fig 5. 24-hour ahead ensemble forecasts of temperature and pressure at Berlin and Hamburg, 
valid 2:00 am on May 27, 2010. The units used are degrees Celsius and hPa. 
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Ensemble reordering Arrange the sampled values in the rank order structure 
of the raw ensemble, to obtain the ECC postprocessed ensemble. 

An illustration of the ECC approach is given in Figure 5, a dynamic version of 
which is available in the Online Supplement. Here, the setting is four-dimensional. 
We consider surface temperature and sea level pressure in Berlin and Hamburg, 
respectively. The scatterplot matrix in the top panel illustrates the 50-member 
ECMWF ensemble forecast at a 24 hours lead time. Clearly, there are dependen- 
cies between the margins; for example, there is a positive association between 
temperature in Berlin and temperature in Hamburg, and there are negative asso- 
ciations between temperature and pressure. The scatterplot matrix in the middle 
panel is constructed from samples of the individually BMA postprocessed predic- 
tive distributions. Here, the systematic errors in the margins have been corrected, 
at the cost of a loss of the error dependence structure. The bottom panel elu- 
cidates the effects of the ECC ensemble reordering; while the margins remain 
unchanged from the middle panel, the rank dependence structure of the raw 
ensemble is restored. 

Owing to the intuitive appeal and striking simplicity, which incurs essential 
no computational costs beyond the marginal postprocessing, approaches of ECC 
type are rapidly gaining prominence at weather centers worldwide, with variants 
recently having been implemented by Flowerdew [2012], Pinson [2012], Roulin 
and Vannitsem [2012] and Schuhen et al. [2012], among others. Our goal here is 
to interpret, fuse and consolidate these and other, seemingly unrelated advances 
within the framework of ECC. As we will demonstrate, the common thread of 
the approaches lies in the adoption of the empirical copula of the raw ensemble, 
thereby restoring its rank dependence structure and justifying the term ensemble 
copula coupling. 

The remainder of the paper is organized as follows. In Section 2 we review 
and discuss statistical postprocessing techniques for univariate NWP ensemble 
output. General copula approaches to the handling of multivariate output are 
discussed in Section 3, with subsequent focus on the ECC approach in Section 4, 
where we distinguish the ECC-Q, ECC-R and ECC-T variants, depending on the 
use of Quantiles, Random draws or Transformations at the quantization stage. 
Section 5 turns to a case study on probabilistic predictions of temperature, pres- 
sure, precipitation and wind over Germany, based on the ECMWF ensemble. The 
paper closes in Section 6, where we discuss benefits and limitations of the ECC 
approach and return to the general theme of uncertainty quantification for high- 
dimensional output from complex simulation models with intricate dependence 
structures. 

2. UNIVARIATE POSTPROCESSING: BAYESIAN MODEL AVERAGING 
(BMA) AND NONHOMOGENEOUS REGRESSION (NR) 

Following the pioneering work of Hamill and Colucci [1997], various types of 
statistical postprocessing techniques for the output of NWP ensemble forecasts 
have been developed, with Wilks and Hamill [2007], Brocker and Smith [2008], 
Schmeits and Kok [2010] and Ruiz and Saulo [2012] providing critical reviews. 
As noted, postprocessing aims to correct for biases and dispersion errors in the 
ensemble output, and state of the art techniques for doing this can roughly be 
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divided into mixture approaches, building on the ensemble Bayesian model aver- 
aging (BMA) approach of Raftery et al. [2005], and regression approaches, such as 
the nonhomogeneous regression (NR) method put forth by Gneiting et al. [2005] . 

Specifically, consider a univariate weather quantity of interest, y, and write 
x±, . . . ,xm for the corresponding M ensemble member forecasts. As noted, the 
ensemble BMA approach uses mixture distributions of the general form 



where the left-hand side refers to the conditional distribution of y given the en- 
semble member forecasts x±, . . . ,xm, and f(y \ x m ) is a parametric distribution 
that depends on x m only. The mixture weights w\, . . . ,w m are nonnegative and 
sum to 1; they reflect the corresponding member's relative contributions to pre- 
dictive skill over a training period. In contrast, the NR predictive distribution is 
a single parametric distribution of the general form 

(2.2) y\xi,...,XM ~ g(y\xi,...,x M ), 

where the right-hand side refers to a parametric family of probability distribu- 
tions, with the parameters depending on all ensemble members simultaneously. 

The particular choice of a parametric model for the BMA kernel / or the NR 
distribution g depends on the weather quantity at hand. Table 1 sketches ensem- 
ble BMA implementations for temperature and pressure [Raftery et al., 2005], 
where the kernel f(y \ x m ) is normal with mean ao m + a\ m x m and variance a^, 
precipitation [Sloughter et al., 2007], wind speed [Sloughter et al., 2010], wind 
direction [Bao et al., 2010], and visibility [Chmielecki and Raftery, 2011]. Fur- 
thermore, ensemble BMA implementations are available for fog [Roquelaure and 
Bergot, 2008], visibility and ceiling [Chmielecki and Raftery, 2011]. Frequently, 
the parameters in the specifications for the mean and the variance of the kernels 
are subject to constraints; for example, the variance parameters are often assumed 
to be constant across ensemble members. If the ensemble is generated in such a 
way that its members are statistically indistinguishable or exchangeable, as in 
the case of the ECMWF ensemble, the BMA weights as well as the BMA mean 
and variance parameters are assumed to be constant across ensemble members 
[Fraley et al., 2010]. Table 2 hints at NR implementations for temperature and 
pressure [Gneiting et al., 2005], where the postprocessed predictive distribution 
is normal with mean a + b\X\ + ■ ■ ■ + bM%M and variance c + dS 2 where S 2 is the 
ensemble variance, precipitation [Wilks, 2009], and wind speed [Thorarinsdottir 
and Gneiting, 2010, Thorarinsdottir and Johnson, 2012]. 

In the remainder of this section we provide a detailed description of the post- 
processing methods for the weather variables temperature, pressure, precipitation 
and wind which are analyzed in our case study. Generally, the ensemble BMA 
method is more flexible, while the NR technique is more parsimonious. In terms 
of the predictive performance, the general experience is that the BMA and NR 
approaches yield comparable results. Software for estimation and prediction is 
available in the form of the ensembleBMA and ensembleMOS packages in R. 1 

lr The ensembleBMA package has been developed and described by Fraley et al. [2011] and is 
available at www.r-project.org. The ensembleMOS package is maintained by Bobby Yuen and 
can be downloaded at www.stat.lsa.umich.edu/~bobyuen/ensembleMOS/index.html. 
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Table 1 

Ensemble BMA implementations for univariate weather quantities. In the case of precipitation 
amount, we refer to y 1 ^ G R + , because the gamma kernels apply to cube root transformed 
precipitation accumulations. In the case of wind direction, § denotes the circle, z m is a 
bias- corrected ensemble member value on the circle, and concentration parameter, for 

m = l,...,M. 
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Table 2 

V7? implementations for univariate weather quantities 
Weather Quantity Range Distribution (g) 



Temperature 
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Normal 




Pressure 
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Precipitation amount 
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Truncated 
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Wind components 
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2.1 Temperature and pressure 

For the weather variables temperature and pressure, Raftery et al. [2005] pro- 
pose the ensemble BMA specification 

M 

(2.3) y\x 1} ...,x M ~ ^ w m N{a 

m=l 

where A/"(/u, a 2 ) denotes a normal distribution with mean \i and variance a 2 . The 
BMA weights wi, . . . , wm, the mean parameters a%, . . . , ajf and &i , . . . , 6m, and 
the variance parameters erf, . . . ,crj^, which in the standard implementation are 
assumed to be constant across ensemble members, are estimated on training data. 
This type of mixture approach has been applied successfully at weather centers 
worldwide, 2 and we give an example in Figure 3. 

Gneiting et al. [2005] propose an NR approach for temperature and pressure, 
in which the predictive distribution is normal, 

(2.4) y\xi,. . . ,xm ~ AA(a + bix\ H h bu^M, c + dS 2 ), 

where S* 2 = Ylm=i( x m ~ x) 2 /M denotes the ensemble variance. If the ensemble 
members are exchangeable, it needs to be assumed that b\ = ••• = bu- This 
approach also has been applied at weather centers internationally, as exemplified 
in the work of Hagedorn et al. [2008] and Kann et al. [2009]. 

2 A real-time ensemble BMA implementation for predictions of temperature and precipitation 
over the Pacific Northwest region of the United States is available to the general public at 
www.probcast.com, based on the University of Washington mesoscale ensemble in the form 
described by Eckel and Mass [2005]. 



10 



R. SCHEFZIK, T. L. THORARINSDOTTIR AND T. GNEITING 



2.2 Precipitation 

While of critical applied importance, probabilistic forecasts for quantitative 
precipitation pose technical challenges, in that the predictive distribution is mixed 
discrete-continuous, comprising both a point mass at zero and a density on the 
positive real axis, which might be considerably skewed. 

Sloughter et al. [2007] propose an ensemble BMA model of the general form 
(2.1) for precipitation accumulation, where the kernel f(y\ Bernoulli- 
Gamma mixture. The Bernoulli component provides a point mass at zero via a 
logistic regression link, in that 

(2.5) logit f[y = | x m ] = log ^ ° j Xm j = a m + f3 m xll 3 + 7 m 5 m , 

J [y > U | x m \ 

where 5 m equals 1 if x m = and equals otherwise. The continuous part of the 
kernel is a gamma distribution in terms of the cube root transformation, y 1 / 3 , of 
the precipitation accumulation, so that 

(2.6) /(y 1 / 3 | x m ) = f[y = 0\ x m ]l {y=0} +f[y>0\ x m ) ^(y 1 / 3 | x m ) t {y>0} , 
where h denotes a gamma distribution with mean fi m and variance a m , where 

(2.7) f-L m — a m + b m x,J L and cr m — c m -\- d m x m . 

Figure 4 shows an example of the resulting BMA postprocessed predictive distri- 
bution in terms of the non-transformed precipitation accumulation, y. 

Turning to the NR approach, we follow Roulin and Vannitsem [2012] and 
interpret the logistic regression technique of Wilks [2009] in this setting. To put 
the method into context, forecasts for the probability of the precipitation amount 
exceeding a certain threshold have commonly been obtained using either quantile 
regression [Bremnes, 2004] or logistic regression [Wilks and Hamill, 2007, Hamill 
et al., 2008]. If a full predictive distribution is sought such methods frequently fail, 
as they typically are inconsistent across thresholds, violating the monotonicity 
constraint for cumulative distribution functions. Wilks [2009] proposes an elegant 
remedy for the logistic regression approach. In his method, the postprocessed 
predictive cumulative distribution function takes the form 

, n c n , x exp(a + b 1 x 1 -\ h b m x M + h(y)) 

(2.8) G(y\x 1 ,...,x M ) = — r — t — r , vr , 

1 + exp(a + bxxi H h b m XM + 

where h grows strictly monotonically and without bounds as a function of the 
precipitation accumulation y > 0. Linear choices for h result in mixtures of a 
point mass at zero and a truncated logistic distribution. 

2.3 Wind 

A wind vector can be represented by wind speed and wind direction, or by its u 
(zonal or west-east) and v (meridional or north-south) velocity components. Wind 
speed is a nonnegative continuous variable. Sloughter et al. [2010] provide an en- 
semble BMA implementation, where the kernel is a gamma distribution with the 
mean and the variance being affme functions of the respective ensemble member 
forecast. Thorarinsdottir and Gneiting [2010] and Thorarinsdottir and Johnson 
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Fig 6. 24-hour ahead NR postprocessed predictive distributions for the u wind component at 
Hamburg based on the 50-member ECMWF ensemble. The ensemble forecast is shown in red, 
the realizing observation in blue. Left: Predictive density valid 2:00 am on April 1, 2011. Right: 
10th, 50th and 90th percentiles of the predictive distributions valid 2:00 am on April 1-14, 2011. 



[2012] develop an NR approach in which the predictive distribution is truncated 
normal. Wind direction is a circular quantity and Bao et al. [2010] propose an 
ensemble BMA specification where the kernel is a von Mises distribution. 

When a wind vector is represented by its u and v components, the methods 
described in Section 2.1 for temperature and pressure become available, and ex- 
ample of NR postprocessed predictive distributions of the form (2.4) for the u 
component are shown in Figure 6. In recent work, truly bivariate postprocessing 
techniques for wind vectors have become available, taking dependencies between 
the components into account [Pinson, 2012, Schuhen et al., 2012, Sloughter et al., 
2013]. These methods are discussed in subsequent sections. 

2.4 Estimation 

Ensemble postprocessing techniques depend on the availability of training data 
for estimating the predictive model. Typically, optimum score approaches have 
been used for estimation [Gneiting et al., 2005], with the maximum likelihood 
technique being a special case thereof [Gneiting and Raftery, 2007] , and Bayesian 
approaches offering alternatives [Di Narzo and Cocchi, 2010]. 

The training data are usually taken from a rolling training period consisting 
of the recent past, including the most recent available ensemble forecasts along 
with the corresponding realizing values. Common choices for the length of the 
training period range from 20 to 40 days. In schemes of this type, the training 
set is updated continually, thereby allowing the estimates to adapt to changes in 
the seasons and weather regimes. Clearly, there is a trade-off here, in that larger 
training periods may allow for better estimation in principle, thereby reducing 
estimation variances, but may introduce biases due to seasonal effects. More 
flexible, adaptive estimation approaches, such as recursive maximum likelihood 
techniques, have been proposed and studied by Pinson et al. [2009], Raftery et al. 
[2010] and Pinson [2012]. 

In addition to deciding on the temporal extent of training sets, choices re- 
garding their spatial composition are to be made. Local approaches use training 
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data from the station location or grid box at hand only, resulting in distinct sets 
of coefficients that are tailored to the local terrain, while regional approaches 
composite training sets spatially, to estimate a single set of coefficients that is 
then used over an entire region [Thorarinsdottir and Gneiting, 2010]. Recently, 
flexible spatially adaptive approaches have been developed that estimate coeffi- 
cients at each station location individually, interpolating them to sites where no 
observational assets are available [Kleiber et al., 2011a, b]. 

Introduced by Hamill et al. [2006], reforecasts are retrospective weather fore- 
casts with today's NWP models applied to past initialization and valid dates. As 
reforecasts are based on the model version that is currently run operationally, the 
availability of reforecast datasets results in massive enlargements of training sets 
for statistical postprocessing. The ensuing gains in the predictive performance can 
be substantial, as demonstrated by Hagedorn et al. [2008], Hamill et al. [2008] 
and Hagedorn et al. [2012], among others. 

3. FROM UNIVARIATE TO MULTIVARIATE PREDICTIVE 
DISTRIBUTIONS: COPULA APPROACHES 

The univariate postprocessing methods discussed thus far yield significant im- 
provement in the predictive performance of raw NWP ensemble output. However, 
in many applications it is critical that multivariate dependencies in the forecast 
error, including the case of temporal, spatial and spatio-temporal weather tra- 
jectories, are accounted for. For example, winter road maintenance requires joint 
probabilistic forecasts of temperature and precipitation [Berrocal et al., 2010], 
air traffic control calls for probabilistic forecasts of wind fields [Chaloulos and 
Lygeros, 2007], the management of renewable energy resources hinges on spatio- 
temporal weather trajectories [Pinson, 2013], and NWP output is used to drive 
hydrologic models, to address tasks such as flood warnings, the operation of wa- 
terways, and releases from reservoirs, with Schaake et al. [2010, pp. 61-62] noting 
in this context that 

"[. . .] relationships between physical variables like, e.g. precipitation and tempera- 
ture should be respected." 

If statistical postprocessing proceeds independently for each weather variable, 
location and look-ahead time, such relationships are ignored, and it is critical 
that they be restored. 

3.1 Handling dependencies: Sklar's theorem 

Taking a technical perspective momentarily, suppose that we have a postpro- 
cessed predictive cumulative distribution function, Fi, for each univariate weather 
quantity Y/, where I = 1,...,L, with the multi-index I = (i,j,k) referring to 
weather variable i, location j and look-ahead time k. What we seek is a phys- 
ically realistic multivariate joint predictive cumulative distribution function F 
with margins F\ , . . . , F^. 

Recall that a copula is a multivariate cumulative distribution function with 
standard uniform margins [Joe, 1997, Nelsen, 2006]. Copulas have been employed 
successfully in a wealth of applications, such as in finance [McNeil et al., 2005], 
hydrology [Genest and Favre, 2007], and climatology [Schoelzel and Friederichs, 
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2008], to name but a few. Their relevance stems from the following celebrated 
theorem of Sklar [1959]. 

Theorem 3.1 (Sklar). For any multivariate cumulative distribution function 
F with margins F±, . . . ,Fl there exists a copula C such that 

(3.1) F(jn, ...,y L ) = C(F l {y 1 ), . . .,F L (y L )) 

for yi, . . . , yjj € R. Furthermore, C is unique on the range of the margins. 

In particular, Sklar's theorem demonstrates that univariate approaches to the 
statistical postprocessing of ensemble output can accommodate any type of joint 
dependence structure, provided that a suitable copula function is specified. As 
copula methods allow for the modeling of the marginal distributions and of the 
multivariate dependence structure, as embodied by the copula, to be decoupled, 
they are ideally suited to our problem. 

3.2 Gaussian and other parametric copula approaches 

If the dimension L of the output quantity is small, or if specific structure can be 
exploited, such as in spatial or temporal settings, parametric or semiparametric 
families of copulas can be employed. 

The most common parametric approaches invoke a Gaussian copula framework, 
under which the multivariate cumulative distribution function F is of the form 

(3.2) C(y l7 ...,y L \E) = ^(fc-^fcft)), . . . , $-\F L (y L )) | £), 

where $>l( • I E) is the cumulative distribution function of an L-variate normal 
distribution with mean zero and correlation matrix £, and is the quantile 
function of the univariate standard normal distribution. The use of Gaussian 
copulas makes for a particularly tractable approach as only the correlation matrix 
E needs to be modeled. In a recent paper, Moller et al. [2013] propose the use 
of Gaussian copulas to recover the cross- variable dependence structure for multi- 
variable forecasts at individual locations, where the ensemble BMA methodology 
is used to obtain the postprocessed marginal predictive distributions. The method 
is straightforward except that precipitation requires special treatment due to the 
mixed discrete-continuous nature of the variable. The recent work of Pinson [2012] 
and Schuhen et al. [2012] on bivariate wind vectors invokes multivariate normal 
predictive distributions, corresponding to the special case in (3.2) in which the 
the margins F\, . . . , Fl are normal. 

The use of Gaussian copula methods has a long and well established tradition 
in geostatistics, where the approach is referred to as anamorphosis; see Chiles and 
Delfiner [2012] and the references therein. In the spatial setting, the correlation 
matrix E in (3.2) is taken to be highly structured, satisfying assumptions such as 
spatial stationarity or isotropy, as exemplified by Gel et al. [2004] and Berrocal 
et al. [2007, 2008] in ensemble BMA approaches to temperature and precipitation 
field forecasting. Similarly, Gaussian copulas have been employed to capture de- 
pendencies over consecutive lead times in postprocessed predictive distributions 
[Pinson et al., 2009, Schoelzel and Hense, 2011]. When the margins F\, . . . , Fl are 
normal, the underlying stochastic model is that of a Gaussian process or Gaus- 
sian random field, and choices in the parameterization of the correlation matrix E 
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correspond to the selection of a parametric correlation model in spatial statistics 
[Stein, 1999, Cressie and Wikle, 2011]. 

While Gaussian copulas yield convenient, ubiquitous stochastic models, para- 
metric or semiparametric alternatives are available, including but not limited to 
the use of elliptical copulas [Demarta and McNeil, 2005], Archimedian copulas 
[McNeil and Neslehova, 2009], extremal copulas [Davison et al., 2012], and pair 
copulas [Aas et al., 2009]. 

3.3 Empirical copulas 

In the common case in which the dimension L of the output quantity is huge 
and no specific structure can be exploited, parametric methods are bound to fail. 
We then need to resort to nonparametric approaches that depend on the use of 
empirical copulas. Here, let {(a^u • • • > x m) '■ m = 1,---,M} denote a data set 
of size M with values in M L . Assuming for simplicity that there are no ties, let 
xk(x l m ) denote the rank of x l m within x\, . . . , x l M . The corresponding empirical 
copula Em is defined as 

/ ' ■ \ 1 M 

(3-3) E M (±, . . . , g = - £ l{rk(4j <i 1: ..., rk(a&) < * L } 

^ ' m=l 

for integers < ii,...,iL < M, with suitable adaptations in the case of ties; 
see Deheuvels [1979], who uses the term empirical dependence function, and 
Riischendorf [2009] and the references therein. 

Any empirical copula is an irreducible discrete copula in the sense described by 
Kolesarova et al. [2006], with Mayor et al. [2007] providing a bivariate version of 
Sklar's theorem in this setting. As we will illustrate below, empirical copulas can 
be thought of as corresponding to Latin hypersquares. Asymptotic theory for the 
respective empirical processes has been developed by Riischendorf [1976], Stute 
[1984], van der Vaart and Wellner [1996], Fermanian et al. [2004] and Riischendorf 
[2009], among other authors. 

In the context of nonparametric approaches to the statistical postprocessing of 
multivariate NWP ensemble output, empirical copulas allow for the adoption of 
multivariate dependence order structure either from historical weather observa- 
tions, as in the Schaake shuffle technique of Clark et al. [2004], or directly from 
the ensemble forecast, to be discussed in detail in Section 4. 

3.4 The Schaake shuffle 

Clark et al. [2004] introduced the ingenious Schaake shuffle as a method for 
reconstructing physically realistic spatio-temporal structure in forecasted temper- 
ature and precipitation fields. Even though it has been presented as a reorder- 
ing technique in the extant literature, an empirical copula interpretation of the 
Schaake shuffle is readily available. 

Consider an output quantity taking values in H L and suppose that we have 
univariate postprocessed predictive distributions iq, . . . , Fi for the margins. Sup- 
pose furthermore that we have a set of M historical weather field observations 
for the IR^-valued output quantity at hand. From the historical record, we can 
construct an empirical copula of the form (3.3), as illustrated in the right-hand 
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Fig 7. The bivariate empirical distribution of the observed u and v wind components at Hamburg 
at 2:00 am on April 1-20, 2011. Left: Bivariate scatterplot. Middle: Representation of the rank 
dependence structure by a Latin square. Right: Empirical copula. 

panel of Figure 7, where we merely have L = 2 as corresponds to the components 
of a wind vector and M = 20. 

To apply the Schaake shuffle, we take a discrete sample of size M from each 
of the univariate postprocessed predictive distributions F%, . . . , Fl, and then we 
reorder to match with the rank order structure in the historical record, which is 
also of size M. This procedure corresponds to the application of the empirical 
copula of the historical weather field record to the discrete samples from the 
univariate postprocessed predictive distribution, and in this sense it is natural to 
consider the Schaake shuffle as an empirical copula technique. The thus reordered 
forecast inherits the multivariate rank dependence structure and the pairwise 
Spearman rank correlation coefficients from the historical weather record at hand. 
A more technical discussion can be given in close analogy to what we describe in 
Section 4.2 within the related context of the ensemble copula coupling approach. 

The Schaake shuffle has met great success in meteorological and hydrologic 
applications, where it recovers observed spatial and cross- variable dependence 
structures as well as temporal persistence [Clark et al., 2004, Schaake et al., 
2007, Voisin et al., 2011]. Nevertheless, there is a major limitation, in that the 
standard implementation fails to condition the multivariate dependence struc- 
ture on current or predicted atmospheric conditions. Clark et al. [2004, p. 260] 
therefore describe a future extension of the Schaake shuffle the idea of which is 

"to preferentially select dates from the historical record that resemble forecasted 
atmospheric conditions and use the spatial correlation structure from this subset of 
dates to reconstruct the spatial variability for a specific forecast." 

In what follows we pursue a related empirical copula approach, in which the 
postprocessed forecast inherits the multivariate dependence structure from the 
raw NWP ensemble, rather than from a historical record of weather observations, 
thereby addressing the lack of atmospheric flow and time dependence in the 
standard Schaake shuffle. 

4. ENSEMBLE COPULA COUPLING (ECC) 

The ensemble copula coupling (ECC) approach draws on the rank order infor- 
mation available in the raw ensemble forecast, based on the implicit assumptions 
that its members are exchangeable, and that the NWP ensemble is capable of 



16 



R. SCHEFZIK, T. L. THORARINSDOTTIR AND T. GNEITING 



1021 1023 




1013 1015 1017 101 



011 1013 1015 1017 



1005 1015 1025 1035 




1010 1020 1030 1040 



010 1020 1030 1040 



Fig 8. Scatterplot matrices for pressure at Berlin, Frankfurt and Hamburg. Left: 48-hour ahead 
ECMWF ensemble forecast valid 2:00 am on April 1, 2011. Right: Empirical distribution of the 
pressure observations at the same hour over the period March 1-31, 2011. 



representing observed cross- variable, spatial and temporal dependence structures. 
While the latter is to be expected, given that NWP models discretize the equa- 
tions that govern the physics of the atmosphere, diagnostic checks are advisable. 
We give an illustration of this in Figure 8, where the dependence structures within 
the ensemble and in the observational record strongly resemble each other. 

4.1 The ECC approach 

Taking a technical perspective for the moment, the ECC approach is a general 
multi-stage procedure for the generation of a postprocessed ensemble of the same 
size, M, as the raw ensemble. We write x\, . . . , x l M for the univariate margins of 
the raw ensemble, where the multi-index I = (i,j,k) refers to weather variable 
i £ {1, . . . , /}, location j E {1, . . . , J} and lead time k £ {1, . . . , K}, to comprise 
NWP output in ~R L , where the dimension is L = / x J x K. In order to generate 
an ECC postprocessed ensemble forecast we proceed as follows. 

Univariate postprocessing For each margin /, obtain a postprocessed predic- 
tive distribution, F;, by applying a univariate postprocessing technique, 
such as ensemble BMA or NR, to the raw ensemble output 

(4.1) x[,...,x l M . 

Quantization Represent each univariate predictive distribution Fi by a discrete 
sample of size M, say 

(4.2) x[,...,x l M . 

The discrete sample can be generated in various ways, to be discussed in 
detail in Section 4.3, where we distinguish the ECC-Q, ECC-R and ECC-T 
variants, depending on how the quantization is performed. 
Ensemble reordering For each margin I, the order statistics of the raw ensem- 
ble values, 

x (l) - " " " - X (M)' 
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Fig 9. The bivariate empirical distribution corresponding to the 24-hour ahead ECMWF ensem- 
ble forecast for the u and v wind vector components at Hamburg at 2:00 am on April 11, 2011. 
Left: Bivariate scatterplot for the 50-member ECMWF ensemble. Middle: Representation of the 
rank dependence structure by a Latin square. Right: Empirical copula. 



induce a permutation 07 of the integers {1, . . . , M}, denned by 07 (m) = 
rk(xj„) for m = 1, . . . , M. If there are ties among the ensemble values, the 
corresponding ranks can be allocated at random. The respective margin of 
the ECC postprocessed ensemble is then given by 

The ECC approach is attractive computationally, in that the modeling of the 
multivariate dependence structure requires only the calculation of marginal ranks. 
In the recent literature, the approach has been introduced as a reordering tech- 
nique, as described colorfully by Flowerdew [2012, p. 15]: 

"The key to preserving spatial, temporal and inter-variable structure is how this 
set of values is distributed between ensemble members. One can always construct 
ensemble members by sampling from the calibrated PDF, but this alone would 
produce spatially noisy fields lacking the correct correlations. Instead, the values 
are assigned to ensemble members in the same order as the values from the raw 
ensemble: the member with the locally highest rainfall remains locally highest, but 
with a calibrated rainfall magnitude." 

That said, it is fruitful to interpret the ECC approach as a nonparametric copula 
technique, which permits us to fuse and consolidate seemingly unrelated, recent 
advances within a single, structured framework. 

4.2 Empirical copula interpretation 

Elaborating on our interpretation of the Schaake shuffle, we now demonstrate 
that the ECC approach can be considered an empirical copula technique. For 
convenience, we assume that there are no ties among the raw ensemble mar- 
gins. We write R%, . . . , Rl for the corresponding marginal empirical cumulative 
distribution functions, which take values in the set 

hi = {0, ^,...,^^,1}. 

The multivariate empirical cumulative distribution function R : M L — > Im of the 
raw ensemble maps into Im, too. According to the discrete version of Sklar's 
theorem described by Mayor et al. [2007] in the bivariate case, there exists a 



18 



R. SCHEFZIK, T. L. THORARINSDOTTIR AND T. GNEITING 



uniquely determined empirical copula Em '■ ijfa ~* such that 

(4.4) R(y h ...,y L ) = E M (Ri(yi), ■ Rl(vl)) 

for all yi, . . . ,ul G R. A simple example in which L = 2 as corresponds to the 
components of a wind vector and M = 50 is given in Figure 9. 

Analogous considerations apply to the quantized independently postprocessed 
ensemble (4.2) and the ECC postprocessed ensemble (4.3). Using obvious nota- 
tion, we write F and F for the corresponding multivariate empirical cumulative 
distribution functions. Furthermore, we denote the marginal empirical cumulative 
distribution functions of the quantized independently postprocessed ensemble by 
Fx, . . . ,Fl, respectively, and we use the symbol Em to denote the corresponding 
copula. Then 

(4.5) F(y 1 ,...,y L )=E M (F 1 (y 1 ),...,F L (y L )) 
and 

(4.6) F(y 1 ,...,y L ) = E M (F 1 (y 1 ),...,F L (y L )) 

for all yi,...,2/L G K- As elucidated by equations (4.4), (4.5) and (4.6), the 
quantized independently postprocessed ensemble and the ECC postprocessed en- 
semble share the margins, whereas the raw ensemble and the ECC postprocessed 
ensemble share the copula, as illustrated in Figure 5. In particular, the ECC 
postprocessed ensemble honors and retains the flow-dependent multivariate rank 
dependence structure and bivariate Spearman rank correlation coefficients in the 
raw NWP ensemble output. 

4.3 ECC-Q, ECC-R and ECC-T 

We now discuss options for the generation of the discrete samples (4.2) at the 
quantization stage of the ECC approach. Perhaps the most natural way of ob- 
taining a discrete sample of size M from the postprocessed predictive cumulative 
distribution function Fi is to take equidistant Quantiles of the form 

(ECC-Q) x[ = Ff l (i) , 4 = Ff 1 (1) , . . . , x l M = Ff 1 , 

and we refer to this approach as ECC-Q. Another option is to take a simple 
Random sample of the form 

(ECC-R) x\ = Ff\ Ul ), ... ,x l M = F-^um), 

where u±, . . . , um are independent standard uniform random variates. We refer 
to this latter option as ECC-R. 

Finally, we consider a quantile mapping or transformation approach that gen- 
eralizes a recent proposal by Pinson [2012] in the case of wind vectors. In this 
technique, we adopt the ensemble smoothing approach of Wilks [2002] and fit 
a parametric, continuous cumulative distribution function Si to the raw ensem- 
ble margin R[. We then extract the quantiles from Fi that correspond to the 
percentiles of the raw ensemble values in Si , in that 



(ECC-T) 



,x l M = F-\S l (x l M )). 
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We refer to this Transformation approach for continuous variables as ECC-T. 
Frequently, as in the case of temperature, pressure and the u and v wind vector 
components, S[ can be taken to be normal, with mean equal to the ensemble mean 
and variance equal to the ensemble variance. In the special situation in which Si 
and F[ belong to the same location-scale family, such that Si(x) = G((x — n)/cr) 
and Fi(x) = G((x — jl)/a) for some continuous cumulative distribution function 
G, fi, jl £ M and a, a > 0, the transformation from x to 

(4.7) x = F- 1 (S l (x)) = fi+^(x-fi) 

becomes affine, and thus the ECC-T postprocessed ensemble conserves the raw 
ensemble's bivariate Pearson product moment correlation coefficients, in addition 
to retaining its bivariate Spearman rank correlation coefficients. 

The literature on the quantization of probability distributions provides partial 
theoretical support in favor of the ECC-Q approach [Graf and Luschgy, 2000, 
Examples 4.17 and 5.5], and so does our case study in Section 5.3, where we 
compare the predictive performance of the ECC-Q, ECC-R and ECC-T schemes. 
We therefore recommend the use of the natural ECC-Q approach. 

4.4 Relationships to extant work 

While the broad framework and the interpretation in terms of empirical copulas 
in our paper are original, the idea of the ECC approach is not new, with its recent 
appearances in the literature coming in various, seemingly unrelated shades and 
flavors. In this context, the connections to the work of Pinson [2012] abd Roulin 
and Vannitsem [2012] are of particular interest. 

The method described in Section 2.c of Roulin and Vannitsem [2012] in the 
context of areal precipitation forecasts can be viewed as a variant of the ECC-Q 
scheme, as it extracts equally spaced quantiles from the postprocessed marginal 
predictive cumulative distribution functions, which are of logistic type, followed 
by a reordering with respect to the raw ensemble values, with adaptations to 
account for a point mass at zero. 

Pinson [2012] proposes a transformation technique for the postprocessing of 
ensemble forecasts of wind vector components. In this method, each postpro- 
cessed margin is a translated and dilated version of the original margin, with the 
mapping being compatible with the ECC-T scheme in the special case in which 
both Si and Fi are normal. 

5. CASE STUDY 

In this case study we exemplify the use of statistical postprocessing techniques, 
illustrate and assess the ECC approach, and compare the predictive performance 
of the ECC-Q, ECC-R and ECC-T schemes, respectively. All forecasts are based 
on the 50-member global NWP ensemble managed by the European Centre for 
Medium-Range Weather Forecasts (ECMWF), which operates at a horizontal 
resolution of approximately 32 km and lead times up to ten days ahead [Molteni 
et al., 1996, Leutbecher and Palmer, 2008]. The differences between the ensemble 
members stem from random perturbations in initial conditions and stochastic 
physics parameterizations, and thus the ensemble members are statistically in- 
distinguishable and can be considered as exchangeable. 
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5.1 Setting 

We restrict attention to the ECMWF ensemble run initialized at 00:00 Univer- 
sal Time Coordinated (UTC) and consider forecasts for surface temperature, sea 
level pressure, precipitation and the u and v wind vector components at lead times 
of 24 and 48 hours, with emphasis on the international airports at Berlin- Tegel, 
Frankfurt am Main and Hamburg in Germany, where 00:00 UTC corresponds to 
2:00 am local time in summer and 1:00 am local time in winter. The locations of 
the three airports are marked in the upper left panel in Figure 1. Our test period 
consists of the twelve month period ranging from May 1, 2010 through April 30, 
2011. Forecasts and observations prior to May 1, 2010 are used as training data 
as needed. 

To obtain postprocessed marginal predictive distributions for each weather 
variable, location and lead time individually, we apply the techniques described 
in Section 2. For temperature and pressure, we employ the ensemble BMA model 
(2.3) with a normal kernel, and for precipitation the Bernoulli-gamma ensemble 
BMA model specified in (2.5), (2.6) and (2.7), respectively. For the wind vector 
components, we use the NR model (2.4). To estimate the univariate predictive 
models, we use local data from a rolling training period consisting of the most 
recent available 30 days and employ the techniques proposed by Raftery et al. 
[2005], Sloughter et al. [2007] and Gneiting et al. [2005]. Then we apply the 
ECC-Q, ECC-R and ECC-T schemes as described in Section 4. 

5.2 Evaluation methods 

Statistical postprocessing techniques aim at generating calibrated and sharp 
probabilistic forecasts from NWP ensemble output. As argued by Gneiting et al. 
[2007], the goal in probabilistic forecasting is to maximize the sharpness of the 
predictive distributions subject to calibration. Calibration is a multi-faceted, joint 
property of the forecasts and the observations; essentially, the forecasts are cali- 
brated if the observations can be interpreted as random draws from the predictive 
distributions. Sharpness refers to the concentration of the predictive distributions, 
and thus is a property of the forecasts only. 

In univariate settings, calibration is checked via the probability integral trans- 
form (PIT) or the verification rank. The PIT is simply the value that the predic- 
tive cumulative distribution function attains at the realizing observation [Dawid, 
1984, Gneiting et al., 2007], with suitable adaptations in the case of discrete distri- 
butions [Czado et al., 2009]. For an ensemble forecast, the verification rank is the 
rank of the realizing observation when pooled with the ensemble values [Hamill, 
2001]. When a predictive distribution is calibrated, the PIT or verification rank 
is uniformly distributed. Thus, calibration can be diagnosed by compositing over 
forecast cases, plotting a PIT or verification rank histogram, respectively, and 
checking for deviations from uniformity. Verification rank and PIT histograms 
are directly comparable, with a U-shape indicating underdispersion, an inverse 
U-shape indicating overdispersion, and skew pointing at biases in the predictive 
distributions. 

Proper scoring rules provide decision theoretically coherent numerical measures 
of predictive performance that assess calibration and sharpness simultaneously. 
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Here we use the proper continuous ranked probability score (CRPS), denned by 

/oo 
(F(z)-t{y < z}) 2 dz 
-oo 

(5.2) = E F \X-y\ - -E F \X-X'\, 

where F is a predictive cumulative distribution function with finite first moment, 
y is the verifying observation, and X and X' are independent random variables 
with distribution F [Gneiting and Raftery, 2007]. If F = 5 X is a point measure, 
the continuous ranked probability score reduces to the absolute error, \x — y\. If 
F = Fens is an ensemble forecast with members xi, ... ,xm £ M, we interpret it 
as an empirical measure and compute the continuous ranked probability score as 

^ M MM 

(5.3) crps(F cns ,y) = — ^ \x m - y\ - Y Yl \ x n~x m \. 

m=l n=l m=l 

We furthermore find the absolute error for the point forecast given by the median 
of the predictive distribution, which is the Bayes predictor under this loss function 
[Gneiting, 2011]. Forecasting methods then are compared by averaging scores over 
the test set, with smaller values indicating better predictive performance. 

To assess the calibration of ensemble forecasts of a multivariate quantity, we 
use the multivariate version of the rank histogram described by Gneiting et al. 
[2008]. We also employ the proper energy score, which generalizes the continuous 
ranked probability score in the representation (5.2), and is defined as 

(5.4) es(F,y) = E F \\X - y\\ - ± E F \\X - X'\\, 

where || • || denotes the Euclidean norm, F is a predictive distribution with finite 
first moments, X and X' are independent random vectors with distribution F, 
and y is the verifying observation [Gneiting and Raftery, 2007]. For ensemble 
forecasts the natural analogue of the formula (5.3) applies. If the scales of the 
weather variables vary, the margins should be standardized before computing the 
joint energy score for these variables. In our implementation, this is done using 
the marginal means and standard deviations of the observations in the test set. 

The aforementioned techniques for the evaluation of probabilistic forecasts of 
multivariate quantities have been developed with low dimensional quantities in 
mind [Gneiting et al., 2008], and we apply them in dimension L < 3 only. In 
higher dimension, these methods lose power, and there is a pronounced need for 
the development of theoretically principled evaluation techniques that are tailored 
to such settings [Pinson, 2013, Section 5.2]. 

5.3 Predictive performance for univariate weather quantities 

Table 3 compares the predictive performance of the raw ensemble and the 
postprocessed predictive distributions for temperature, pressure, precipitation 
and the u and v wind vector components at lead times of 24 and 48 hours at 
Berlin, Frankfurt and Hamburg, respectively. The BMA and NR postprocessing 
generally leads to a significant improvement in the predictive skill, as measured 
by the mean CRPS and the MAE, with exceptions in the case of precipitation. 
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Table 3 

Mean continuous ranked probability score (CRPS) and mean absolute error (MAE) for 
univariate forecasts of temperature, pressure, precipitation, and u and v wind components at 
Berlin, Frankfurt and Hamburg, at lead times of 24 and 48 hours, respectively, for a test period 
ranging from May 1, 2010 through April 30, 2011. 
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Fig 10. Calibration checks for 48-hour ahead forecasts of temperature, pressure, precipitation 
and u wind at Frankfurt, for a test period ranging from May 1, 2010 through April 30, 2,011. 
Top: Verification rank histograms for the ECMWF ensemble. Bottom: PIT histograms for BMA 
or NR postprocessed predictive distributions. 
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Not unexpectedly, the performance generally is better at the shorter prediction 
horizon of 24 hours. 

Figure 10 shows verification rank and PIT histograms for temperature, pres- 
sure, precipitation and u wind at a lead time of 48 hours at Frankfurt. The 
postprocessed forecasts show much better calibration, as evidenced by the nearly 
uniform PIT histograms, except perhaps in the case of precipitation, where a 
slight inverse U-shape of the PIT histogram may indicate overdispersion in the 
BMA postprocessed predictive distributions. 

5.4 Predictive performance for multivariate weather quantities 

We now give an illustration and initial evaluation of ECC postprocessed mul- 
tivariate predictive distributions. 

Table 4 and Figure 11 concern temperature and pressure, with each of these 
variables being considered at Berlin, Frankfurt and Hamburg jointly. The distance 
from Frankfurt to either Berlin or Hamburg is on the order of 400 kilometers, and 
the distance between Berlin and Hamburg is approximately 250 kilometers. Wind 
and precipitation patterns vary at considerably smaller spatial scales and we thus 
do not expect ECC to make much of a difference here. In contrast, forecast errors 
for pressure can be expected to show pronounced long range dependencies, and 
perhaps to some lesser extent for temperature. The scores and multivariate rank 
histograms confirm the strongly positive effects of ECC in the case of pressure, 
where the ECC postprocessed trivariate predictive distributions are much better 
calibrated than either the raw ensemble or the independent BMA postprocessed 
predictive distributions. The ECC-Q quantization scheme outperforms the ECC- 
R and ECC-T approaches. 

While for temperature the BMA postprocessing improves strongly on the raw 
ensemble forecast, the effect of ECC is minor, if not negative, due to the cor- 
relations in the forecast errors being negligible at the distances considered here. 
That said, Figure 12 illustrates the strongly positive effects of ECC on temper- 
ature field forecasts, where dependencies at short and moderate distances are 
of critical importance. Here we consider the 1,221 NWP model grid boxes over 
Germany, with the forecast made a day ahead for 2:00 am on April 25, 2011, for 
what promises to be a pleasant, unusually warm spring night. 

The postprocessing uses a single BMA model of the form (2.3), which is trained 
on spatially pooled pairs of ensemble forecasts and corresponding nowcasts from 
the previous 20 days. The unprocessed raw ECMWF ensemble appears to cap- 
ture spatial structure fairly well, but it has an overall negative bias, especially in 
the mountainous Alpes region in the south and in the central east of the country. 
While the BMA postprocessing addresses biases, and the use of a single BMA 
model avoids inconsistencies between the univariate postprocessed predictive dis- 
tributions themselves, the independent samples result in noisy and incoherent 
spatial structure. The ECC postprocessed ensemble inherits the bias-corrected 
marginals from the independent BMA postprocessed forecast and simultaneously 
maintains the L = 1, 221 variate dependence structure in the raw ensemble. The 
nowcast that serves as grid based ground truth is the relevant initialization of 
the ECMWF's so-called control run [Molteni et al., 1996]. 

While these examples concern the spatial case only, ECC is equally well suited 
to handling temporal and cross-variable dependencies, with Figure 5 illustrat- 
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Table 4 

Mean energy score for 48-h ahead forecasts of temperature and pressure, each considered at 
Berlin, Frankfurt and Hamburg jointly, for a test period ranging from May 1, 2010 through 
April 30, 2011. The scores for the independent BMA and ECC-R techniques, which involve 
randomization, are averaged over 100 repetitions. 





Temperature 
(°C) 


Pressure 
(hPa) 


ECMWF 


2.342 


1.478 


BMA 


1.925 


1.477 


ECC-Q 


1.922 


1.429 


ECC-R 


1.945 


1.454 


ECC-T 


1.934 


1.442 



Temperature 







1 11 21 31 41 51 

Multivariate Rank 



1 11 21 31 41 51 

Multivariate Rank 



1 11 21 31 41 51 

Multivariate Rank 



1 11 21 31 41 51 

Multivariate Rank 




21 31 41 51 

Multivariate Rank 



1 11 21 31 41 51 

Multivariate Rank 



Fig 11. Multivariate rank histograms for 48-h ahead ensemble forecasts of temperature and 
pressure, each considered at Berlin, Frankfurt and Hamburg jointly, for a test period ranging 
from May 1, 2010 through April 30, 2011 
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Fig 12. 24-hour ahead ensemble forecasts for temperature over Germany valid 2:00 am on April 
25, 2011, in the unit of degrees Celsius. Top row: Four randomly selected members of the raw 
ECMWF ensemble. Second row: Independent BMA postprocessing — for each grid box, a random 
number from the corresponding BMA postprocessed predictive distribution is drawn. Third row: 
Four randomly selected members of the corresponding ECC ensemble. Bottom row: Nowcast. 
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ing the latter aspect. To generate physically realistic and consistent ensemble 
forecasts of temporal trajectories, constraints can be put on the BMA or NR 
parameters, so that they vary smoothly across lead times, which ensures the tem- 
poral consistency of the postprocessed marginal predictive distributions. Then, 
the ECC approach can be used to account for dependence structures across lead 
times. These settings are being investigated in ongoing work, and we expect to 
report quantitative results in due time. 

6. DISCUSSION 

The intensified attention to the quantification of uncertainty in the output 
of complex simulation models poses major challenges in a vast range of critical 
applications. In this paper, we have introduced the general uncertainty quantifica- 
tion framework of ensemble copula coupling (ECC), which we have illustrated on 
the key example of numerical weather prediction (NWP). The approach is con- 
ceptionally very simple and straightforward to implement in practice. Starting 
from raw ensemble output, ECC employs standard techniques to obtain post- 
processed predictive distributions for each of the univariate margins individually. 
Then we quantize the postprocessed predictive distributions and adopt the rank 
dependence structure of the raw ensemble, as embodied by its empirical copula. 

The defining feature of the ECC approach, namely, the adoption of the rank 
order structure of the raw ensemble, also sets its limitations. The number of 
members in the ECC postprocessed ensemble equals that of the raw ensemble, 
which typically is small, and ECC operates under a perfect model assumption 
with respect to the multivariate rank dependence structure. However, for state of 
the art NWP models such an assumption seems reasonably adequate in practice 
and can be comfirmed by diagnostic checks, as we have illustrated in Figure 8. 

Currently, approaches of the ECC type are being investigated and tested by 
weather centers internationally; see, for example, the recent work of Flowerdew 
[2012], Pinson [2012] and Roulin and Vannitsem [2012]. We applaud these devel- 
opments and call for case studies and quantitative comparisons to the Schaake 
shuffle [Clark et al., 2004], which also admits an empirical copula interpretation. 
In ECC, the multivariate dependence structure of the forecast errors derives from 
the ensemble forecast; in the Schaake shuffle, it derives from a record of histori- 
cal weather observations. Judiciously designed combinations of the ECC and the 
Schaake shuffle approaches might well lead to improved predictive performance. 

If the model output under consideration is low-dimensional or strongly struc- 
tured, parametric copula approaches become available, which may allow for the 
correction of any systematic errors in the ensemble's representation of condi- 
tional dependence structures. Here, the most prominent option lies in the use of 
Gaussian copulas, as in the general approach of Moller et al. [2013] and in the 
temporally or spatially structured settings of Gel et al. [2004], Berrocal et al. 
[2007, 2008] and Pinson et al. [2009] . In such situations, it is to be expected that 
parametric techniques outperform the ECC approach and the Schaake shuffle, 
and comparative studies of the predictive abilities and relative merits of the var- 
ious methods are strongly encouraged. Given its intuitive appeal and simplicity 
of implementation, the ECC approach offers a natural benchmark. 

In Figure 12 we have given an example of how ECC can be used to restore 
spatial consistency in weather field forecasts directly on the model grid. The 
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aforementioned parametric Gaussian approaches of Gel et al. [2004] and Berrocal 
et al. [2007] can achieve this, too, but require elaborate spatial statistical models 
to be fitted. In contrast, the computational and human resources necessitated by 
ECC are nearly negligible, and ECC can also handle temporal and cross- variable 
dependencies, for model output of nearly any dimensionality. 

While we have focused on weather forecasting in this paper, the general frame- 
work of ECC as a multi-stage approach to the quantification of uncertainty in 
the output of complex simulation models with intricate multivariate dependence 
structures is likely to be useful in a vast range of applications. Essentially, ECC 
can be applied whenever an ensemble of simulation runs is available, the ensem- 
ble is capable of realistically representing multivariate dependence structures, 
and training data for the statistical correction of the univariate margins are at 
hand. In this general setting of uncertainty quantification, the goals articulated by 
Gneiting et al. [2007] continue to provide guidance, in that we seek to gauge our 
incomplete knowledge of current, past or future quantities of interest by means of 
joint probability distributions, which ought to be as sharp as possible, subject to 
them being calibrated, in the broad sense of reality being statistically compatible 
with the postprocessed distributions. 

ONLINE SUPPLEMENT 

A dynamic version of Figure 5 is provided in the Online Supplement. The 
ensemble reordering step in the ECC approach is elucidated when switching back 
and forth between pages. 
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